ABSTRACT

This paper reviews the basic elements of the EM
approach to estimating item parameters and illustrates its use with
one simulated and one real data set. In order to illustrate the use
of the BILOG computer program, runs for 1-, 2-, and 3-parameter
models are presented for the two sets of data. First is a set of
responses from 1,000 persons to five items of the Law School
Admissions Test. Second is a set of simulated responses of 1,000 persons
to 18 items. The examples bring into focus the degree to which item
parameters in the 3-parameter model can be recovered. The review
discusses an EM algorithm for estimating item parameters; the solution
for item parameters when person ability values are known; early
computer program approaches; and the key elements of the Bock-Aitkin
approach. Further described are extensions of the Bock-Aitkin
approach, which include: (1) extension to the 3-parameter model; (2)
prior distributions on item parameters; (3) estimation of the latent
distribution; and (4) different patterns of item attempts for
different persons. (PN)
Marginal maximum likelihood equations for estimating the item parameters in
the 1- and 2-parameter normal ogive item response models were introduced by Bock
and Aitkin (1981). The iterative solution of these equations bears a strong
resemblance to the EM algorithm of Dempster, Laird, and Rubin (1977). Over the
past year, similar procedures have been implemented in the BILOG computer
program (Bock & Mislevy, 1982) for estimating item parameters in the 1-, 2-, and
3-parameter logistic ogive models. Extensions of the original Bock and Aitkin
solution include the simultaneous characterization of the latent population
distribution and the incorporation of Bayes priors on item parameters, so that
Bayes modal rather than maximum likelihood estimates may be obtained.

The purpose of this paper is to review the basic elements of the EM
approach to estimating item parameters and to illustrate its use with one
simulated and one real data set. The examples bring into focus a topic of occasional
discussion in psychometric circles, namely, the degree to which item parameters
in the 3-parameter model can be recovered.

An EM Algorithm for Estimating Item Parameters 

The 3-parameter logistic ogive item response model for dichotomous test
items, of which the 1- and 2-parameter models may be considered special cases,
expresses the probability that person i will respond correctly to item j as

$$P_{ij} = \mathrm{Prob}(x_{ij}=1 \mid \theta_i) = G_j + (1-G_j)\,\Psi[A_j(\theta_i - B_j)] = G_j + (1-G_j)\,\Psi(A_j\theta_i + C_j) \qquad [1]$$



where

x_ij, the item response, is 1 if correct and 0 if incorrect;

Ψ(x) is the cumulative logistic function, 1/[1 + exp(-x)];

G_j is the lower asymptote, often called the guessing parameter
of item j, identically zero in the 1- and 2-parameter models;

A_j is the slope of item j, a constant over items in the 1-parameter model;

B_j is the threshold of item j;

C_j, equal to -A_j B_j, is the item intercept, introduced
because estimation equations for the intercepts are simpler
than those for item thresholds; and

θ_i is the ability of person i.
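As a concrete reading of Equation 1, the short Python sketch below (assuming NumPy; the function names are illustrative and not taken from BILOG) evaluates the response probability from slope, threshold, and lower asymptote, forming the intercept as C_j = -A_j B_j.

```python
import numpy as np

def logistic(z):
    """Cumulative logistic function, Psi(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def prob_correct(theta, slope, threshold, asymptote=0.0):
    """Probability of a correct response under Equation 1.

    asymptote=0 gives the 2-parameter model; holding the slope constant over
    items as well gives the 1-parameter model.
    """
    intercept = -slope * threshold  # C_j = -A_j * B_j
    return asymptote + (1.0 - asymptote) * logistic(slope * theta + intercept)

# Example: an item with slope 2.0, threshold 0.0, and lower asymptote .25
theta = np.array([-2.0, 0.0, 2.0])
print(prob_correct(theta, slope=2.0, threshold=0.0, asymptote=0.25))
```

At θ = B_j the probability is halfway between the asymptote and one, which is why B_j is read as the item's location on the ability scale.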

Given observed responses from N persons to n items, the item parameters may
be estimated. The main problem arising in this endeavor is that, except in the
1-parameter model, the person parameters cannot be eliminated from the maximum
likelihood estimation equations for the item parameters. Because the number of
these so-called "nuisance" parameters grows with the number of persons, the
standard results of maximum likelihood theory (e.g., consistency) do not apply.

A Solution When Ability Is Known

Estimation of item parameters would be straightforward if person ability
values were known rather than implied by item responses. This is essentially
the case that obtains in the bioassay setting, where the researcher controls the
level of treatment dosage given to each experimental unit, observes the proportion
of units exhibiting the targeted response at each dosage level, and then estimates
a hypothesized underlying logistic or normal response function. In anticipation
of the EM solution for item parameters, likelihood equations are presented for a
logit regression problem that parallels the psychometric problem.

Suppose that, as in the bioassay setting, responses to each of n test items
are observed from groups of persons at each of q specified points along the
ability scale. Let N_jk be the number of responses to item j from persons with
ability X_k, and let R_jk be the number of these responses that are correct.
Under the usual assumption of local independence, the total likelihood of a
collection of observations of this type is as follows:

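$$L = \prod_{j=1}^{n}\prod_{k=1}^{q} \binom{N_{jk}}{R_{jk}}\, P_{jk}^{R_{jk}}\,(1-P_{jk})^{N_{jk}-R_{jk}} \qquad [2]$$

where P_jk is the probability of a correct response to item j at ability X_k, that is, Equation 1 with θ_i replaced by X_k.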

The likelihood equations for the item parameters are the first derivatives of 
the log of Equation 2, equated to zero: 
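$$\frac{\partial \log L}{\partial \xi_j} = \sum_{k=1}^{q} \frac{R_{jk} - N_{jk}P_{jk}}{P_{jk}(1-P_{jk})}\,\frac{\partial P_{jk}}{\partial \xi_j} = 0, \qquad \xi_j = C_j, A_j, G_j \qquad [3]$$

which gives

$$C_j:\quad 0 = \sum_k \left(R_{jk} - N_{jk}P_{jk}\right)\frac{P^*_{jk}}{P_{jk}} \qquad [4]$$

$$A_j:\quad 0 = \sum_k \left(R_{jk} - N_{jk}P_{jk}\right)\frac{X_k\,P^*_{jk}}{P_{jk}} \qquad [5]$$

$$G_j:\quad 0 = (1-G_j)^{-1}\sum_k \frac{R_{jk} - N_{jk}P_{jk}}{P_{jk}} \qquad [6]$$

with

$$P^*_{jk} = \Psi(A_j X_k + C_j) \qquad [7]$$

$$P_{jk} = G_j + (1-G_j)\,P^*_{jk} \qquad [8]$$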

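If the vector of values that solves these equations is unique and if the matrix
of second derivatives of the log of Equation 2 is negative definite when
evaluated at these values, then these values are the maximum likelihood estimates
of the item parameters. The second derivatives are

$$C_j, C_j:\quad \sum_k P^*_{jk}(1-P^*_{jk})\left(\frac{G_j R_{jk}}{P_{jk}^2} - N_{jk}\right) \qquad [9]$$

$$C_j, A_j:\quad \sum_k P^*_{jk}(1-P^*_{jk})\left(\frac{G_j R_{jk}}{P_{jk}^2} - N_{jk}\right) X_k \qquad [10]$$

$$C_j, G_j:\quad -\sum_k P^*_{jk}(1-P^*_{jk})\,\frac{R_{jk}}{P_{jk}^2} \qquad [11]$$

$$A_j, A_j:\quad \sum_k P^*_{jk}(1-P^*_{jk})\left(\frac{G_j R_{jk}}{P_{jk}^2} - N_{jk}\right) X_k^2 \qquad [12]$$

$$A_j, G_j:\quad -\sum_k P^*_{jk}(1-P^*_{jk})\,\frac{R_{jk}\,X_k}{P_{jk}^2} \qquad [13]$$

$$G_j, G_j:\quad -\sum_k (1-P^*_{jk})^2\left[\frac{R_{jk}}{P_{jk}^2} + \frac{N_{jk}-R_{jk}}{(1-P_{jk})^2}\right] \qquad [14]$$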

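The solution of the likelihood equations may be accomplished by Newton-Raphson
iterations, carried out item by item. The (t+1)th iteration is

$$\begin{bmatrix} C_j \\ A_j \\ G_j \end{bmatrix}^{t+1} = \begin{bmatrix} C_j \\ A_j \\ G_j \end{bmatrix}^{t} - \begin{bmatrix} \mathrm{SDRV}(C_jC_j) & \mathrm{SDRV}(C_jA_j) & \mathrm{SDRV}(C_jG_j) \\ \mathrm{SDRV}(C_jA_j) & \mathrm{SDRV}(A_jA_j) & \mathrm{SDRV}(A_jG_j) \\ \mathrm{SDRV}(C_jG_j) & \mathrm{SDRV}(A_jG_j) & \mathrm{SDRV}(G_jG_j) \end{bmatrix}^{-1} \begin{bmatrix} \mathrm{FDRV}(C_j) \\ \mathrm{FDRV}(A_j) \\ \mathrm{FDRV}(G_j) \end{bmatrix} \qquad [15]$$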
where FDRV denotes a first derivative (Equations 4 through 6), SDRV denotes a
second derivative (Equations 9 through 14), and all first and second derivatives
are evaluated at the stage-t estimates of the item parameters.
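To make the item-by-item Newton-Raphson step concrete, here is a minimal Python sketch (assuming NumPy; `newton_raphson_item` is a hypothetical name, not a BILOG routine) that evaluates the first derivatives of Equations 4-6 and the second derivatives of Equations 9-14, then applies the update of Equation 15. The check uses expected counts that follow known parameters exactly, so the maximum likelihood solution is known in advance.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def newton_raphson_item(R, N, X, C, A, G, max_iter=50, tol=1e-10):
    """Solve Equations 4-6 for one item by the Newton-Raphson update of Equation 15.

    R, N : numbers correct and numbers of responses at each ability point X_k
    C, A, G : starting values for the intercept, slope, and lower asymptote
    """
    for _ in range(max_iter):
        Ps = logistic(A * X + C)                  # P*_jk, Equation 7
        P = G + (1.0 - G) * Ps                    # P_jk, Equation 8
        resid = R - N * P
        # First derivatives FDRV, Equations 4-6
        fdrv = np.array([
            np.sum(resid * Ps / P),
            np.sum(resid * X * Ps / P),
            np.sum(resid / P) / (1.0 - G),
        ])
        # Second derivatives SDRV, Equations 9-14
        w = Ps * (1.0 - Ps)
        core = w * (G * R / P**2 - N)
        cross = -w * R / P**2
        h_cc, h_ca, h_aa = core.sum(), (core * X).sum(), (core * X**2).sum()
        h_cg, h_ag = cross.sum(), (cross * X).sum()
        h_gg = -np.sum((1.0 - Ps)**2 * (R / P**2 + (N - R) / (1.0 - P)**2))
        sdrv = np.array([[h_cc, h_ca, h_cg],
                         [h_ca, h_aa, h_ag],
                         [h_cg, h_ag, h_gg]])
        step = np.linalg.solve(sdrv, fdrv)        # SDRV^{-1} FDRV without forming the inverse
        C, A, G = C - step[0], A - step[1], G - step[2]
        if np.max(np.abs(step)) < tol:
            break
    return C, A, G

# Check: counts that exactly follow C = -1.0, A = 1.5, G = .2 at 41 known abilities
X = np.linspace(-4.0, 4.0, 41)
N = np.full_like(X, 1000.0)
R = N * (0.2 + 0.8 * logistic(1.5 * X - 1.0))
print(newton_raphson_item(R, N, X, C=-0.95, A=1.45, G=0.19))
```

The sketch deliberately omits safeguards such as step halving or keeping G_j inside (0, 1); with sparse data or poor starting values those become necessary, for the conditioning reasons discussed later.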

An Earlier Approach to the Problem 

In the bioassay setting, where the criterion (dosage level) is known, the
preceding solution is correct. One approach to the psychometric setting, where
the criterion (ability) is not known, is to replace the unknown ability
parameters with provisional estimates. This approach is employed by computer
programs such as LOGOG (Kolakowski & Bock, 1973), LOGIST (Wood, Wingersky, & Lord,
1977), and BICAL (Wright & Mead, 1978). LOGOG, for example, employs for the
2-parameter model an algorithm similar to the one outlined below:

1. Use persons' logits of percent correct as provisional ability estimates.

2. Standardize provisional ability estimates.

3. On the basis of provisional ability estimates, form groups of persons
with apparently similar abilities.

4. Assuming all persons in a group have the same true ability (the mean of
their provisional estimates), solve Equations 4 and 5 to estimate item
parameters.

5. Using provisional item parameter estimates, re-estimate person abilities.

6. Return to Step 2.

Cycles of this type were repeated until convergence was attained, which, it was
learned, became less likely as the number of items and/or persons decreased. A
major problem was the unreliability of the estimates of person ability when the
number of items was small; in such cases, person ability estimates were a poor
substitute for the true values.
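Steps 1 through 3 of the provisional-ability cycle can be sketched in a few lines of Python (assuming NumPy). The function name is hypothetical, and the 0.5 correction for perfect and zero scores and the equal-count grouping are assumptions of this sketch, not documented LOGOG behavior.

```python
import numpy as np

def provisional_groups(x, n_groups=10):
    """Steps 1-3 of the provisional-ability cycle for a persons-by-items 0/1 matrix x."""
    n_items = x.shape[1]
    # Step 1: logit of percent correct. The +0.5 / +1 correction is an assumption
    # added here so that perfect and zero scores get finite logits.
    p = (x.sum(axis=1) + 0.5) / (n_items + 1.0)
    theta = np.log(p / (1.0 - p))
    # Step 2: standardize the provisional ability estimates.
    theta = (theta - theta.mean()) / theta.std()
    # Step 3: equal-count groups of persons with similar provisional abilities.
    order = np.argsort(theta, kind="stable")
    groups = np.empty(len(theta), dtype=int)
    for g, members in enumerate(np.array_split(order, n_groups)):
        groups[members] = g
    # Step 4 would treat each group mean as the common "true" ability of its members.
    group_means = np.array([theta[groups == g].mean() for g in range(n_groups)])
    return theta, groups, group_means

rng = np.random.default_rng(0)
x = (rng.random((200, 5)) < 0.6).astype(int)
theta, groups, group_means = provisional_groups(x)
print(np.round(group_means, 2))
```

With only five items, the provisional logits take at most six distinct values, which illustrates why such estimates are a poor substitute for true abilities when tests are short.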

Key Elements of the Bock-Aitkin Approach

An alternative does exist, however: one that derives from long-standing
procedures in the statistical literature in general and from an honorable
tradition in psychometrics in particular (e.g., Kelley's paradox). The idea
is this: suppose that persons can be thought of as a random sample from a
population in which ability is distributed in accordance with a distribution g(θ).
Although each person's response vector x_i may not contain very much information
about that person, it contains information about g. Taken together, the data of
all persons may be sufficient to produce a fairly good characterization of g,
which, in turn, may be used to condition and improve the inference about any
individual person.

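Now if g is a smooth distribution with finite moments, it may be approximated
to any desired degree of accuracy by a discrete distribution over a finite
number of points, i.e., a histogram. Let X_k, for k = 1, ..., q, be the points,
and let A(X_k) be the densities at those points. By Bayes' theorem, the posterior
density of θ, given the response vector x_i of person i, is obtained as

$$P(X_k \mid \mathbf{x}_i) = \frac{P(\mathbf{x}_i \mid X_k)\,A(X_k)}{\sum_{s=1}^{q} P(\mathbf{x}_i \mid X_s)\,A(X_s)}, \qquad k = 1, \ldots, q \qquad [16]$$

where, under local independence, P(x_i | X_k) is the product, over the items presented to person i, of P_jk for correct responses and 1 - P_jk for incorrect ones.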
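Equation 16 can be computed for all persons at once. The Python sketch below (assuming NumPy; `posterior_weights` is a hypothetical name) works on the log scale to avoid underflow for long tests and uses a 0/1 presentation indicator so that items a person did not see contribute nothing to that person's likelihood.

```python
import numpy as np

def posterior_weights(x, d, P, A):
    """Posterior probabilities P(X_k | x_i) at the points X_k (Equation 16).

    x, d : persons-by-items responses (0/1) and presentation indicators (0/1)
    P    : points-by-items probabilities of a correct response, P_jk
    A    : densities A(X_k) at the q points, summing to one
    """
    logP, log1mP = np.log(P), np.log1p(-P)
    # log P(x_i | X_k): sum over presented items only (local independence)
    loglik = (d * x) @ logP.T + (d * (1 - x)) @ log1mP.T     # shape (persons, q)
    logpost = loglik + np.log(A)
    logpost -= logpost.max(axis=1, keepdims=True)              # guard against underflow
    post = np.exp(logpost)
    return post / post.sum(axis=1, keepdims=True)

P = np.array([[0.2], [0.5], [0.8]])      # three points, one item
A = np.array([0.25, 0.5, 0.25])
x = np.array([[1], [0], [1]])
d = np.array([[1], [1], [0]])            # the third person was not presented the item
print(posterior_weights(x, d, P, A))
```

The third person's posterior simply reproduces the prior weights A(X_k), since no data were observed for that person.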
Application to the estimation of item parameters is accomplished in the
algorithm outlined below:

1. Using provisional estimates of item parameters, compute via Equation 1
the likelihood of each person's response pattern at each of the points,
namely, P(x_i | X_k).

2. Using given values (Bock & Aitkin, 1981) or provisional estimates (see
below) of the densities A(X_k) at each of the points, compute via Equation 16
the posterior probability that the ability of person i is X_k.

3. (E-step) Pseudo-counts of the numbers of items attempted and the numbers of
items correct at each point are then obtained by effectively distributing the
data from each person over the points in proportion to the likelihood of
his/her being there, as follows:

$$\bar{N}_{jk} = \sum_i d_{ij}\,P(X_k \mid \mathbf{x}_i) = \sum_i d_{ij}\,\frac{P(\mathbf{x}_i \mid X_k)\,A(X_k)}{\sum_s P(\mathbf{x}_i \mid X_s)\,A(X_s)} \qquad [17]$$

and

$$\bar{R}_{jk} = \sum_i d_{ij}\,x_{ij}\,P(X_k \mid \mathbf{x}_i) = \sum_i d_{ij}\,x_{ij}\,\frac{P(\mathbf{x}_i \mid X_k)\,A(X_k)}{\sum_s P(\mathbf{x}_i \mid X_s)\,A(X_s)} \qquad [18]$$

where d_ij is 1 if person i was presented item j and 0 if not.

4. (M-step) The maximum likelihood equations for the item parameters,
Equations 4 through 6, are then solved with the pseudo-counts in place of
N_jk and R_jk.

5. Unless item parameters are unchanged from the previous cycle, return to
Step 1.
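Given a persons-by-points matrix of posterior weights, the E-step pseudo-counts of Equations 17 and 18 reduce to two matrix products. This is a minimal Python sketch assuming NumPy; `e_step_counts` is a hypothetical name, and the posterior matrix in the usage lines is a stand-in with rows summing to one.

```python
import numpy as np

def e_step_counts(post, x, d):
    """Pseudo-counts N_bar_jk and R_bar_jk of Equations 17 and 18.

    post : persons-by-points posterior weights P(X_k | x_i), rows summing to one
    x, d : persons-by-items responses and presentation indicators (0/1)
    Returns points-by-items arrays (N_bar, R_bar).
    """
    N_bar = post.T @ d            # sum_i d_ij P(X_k | x_i)
    R_bar = post.T @ (d * x)      # sum_i d_ij x_ij P(X_k | x_i)
    return N_bar, R_bar

post = np.array([[0.1, 0.5, 0.4],
                 [0.4, 0.5, 0.1],
                 [0.25, 0.5, 0.25]])
x = np.array([[1, 0], [0, 1], [1, 1]])
d = np.array([[1, 1], [1, 1], [0, 1]])
N_bar, R_bar = e_step_counts(post, x, d)
print(N_bar.sum(axis=0))   # persons presented each item
```

Because each row of the posterior matrix sums to one, summing N_bar over points recovers the number of persons who were presented each item; the M-step then treats these fractional counts exactly as the N_jk and R_jk of the known-ability case.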

Bock and Aitkin (1981) showed that, for given g, this procedure provides
item parameter estimates that solve the maximum likelihood equations of the
marginal likelihood

$$P(\text{data} \mid \text{item parameters}) = \prod_i P(\mathbf{x}_i) = \prod_i \int P(\mathbf{x}_i \mid \theta)\,g(\theta)\,d\theta \qquad [19]$$

The problem with the "nuisance" ability parameters has been solved by
integrating over their range, rather than by replacing them with estimates as in
LOGOG or conditioning them away as is possible with the 1-parameter model only.
As a result, the unreliability in the ability estimate for a person has been
ameliorated. Rather than basing the estimation of item parameters on a large
number of unreliable person ability estimates, it has been based on the much
more stable estimates of population densities at various points along the
ability scale and expected proportions of correct response at those points.
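The integral in Equation 19 is evaluated in practice as a sum over the points X_k. The Python sketch below (assuming NumPy; function names are hypothetical) computes the resulting marginal log-likelihood. Using equally spaced points with normalized normal-density weights is an assumption made for illustration; Gauss-Hermite points are a common alternative.

```python
import numpy as np

def normal_points(q=10, lo=-4.0, hi=4.0):
    """Equally spaced points with normalized N(0, 1) density weights (an assumption)."""
    X = np.linspace(lo, hi, q)
    A = np.exp(-0.5 * X**2)
    return X, A / A.sum()

def marginal_loglik(x, d, P, A):
    """log of Equation 19, with the integral replaced by a sum over the points X_k.

    x, d : persons-by-items responses and presentation indicators (0/1)
    P    : points-by-items probabilities P_jk;  A : weights A(X_k)
    """
    loglik = (d * x) @ np.log(P).T + (d * (1 - x)) @ np.log1p(-P).T   # (persons, q)
    m = loglik.max(axis=1, keepdims=True)                              # log-sum-exp shift
    return float(np.sum(m[:, 0] + np.log(np.exp(loglik - m) @ A)))

X, A = normal_points(10)
P = np.full((10, 1), 0.7)          # one item whose probability does not depend on ability
x = np.array([[1], [0]])
d = np.ones_like(x)
print(marginal_loglik(x, d, P, A))  # equals log(.7) + log(.3) for this degenerate item
```

No person-level ability estimate appears anywhere in this quantity; each person contributes only through a weighted average over the points.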

Extensions of the Bock-Aitkin Approach 

The basic approach to estimating item parameters outlined above was shown
by Bock and Aitkin to be a maximum likelihood solution under the conditions of
(1) the 1- and 2-parameter normal ogive model, (2) all persons being
administered the same set of items, and (3) the weights A(X_k) remaining fixed
throughout the solution, i.e., persons were in effect assumed to be a random
sample from a known distribution. (By comparing item parameter estimates obtained
with different priors on ability, this latter assumption was shown to be relatively
unimportant; the item parameters varied little in the examples shown.) Since the
publication of the article, progress has continued in the investigation of this
approach. A number of extensions have been incorporated into the BILOG program.

Extension to the 3-parameter model. Along with the change to the logistic
rather than the normal ogive response curve, provision for obtaining item
parameter estimates in the 3-parameter model has been included. Item parameter
estimation in this model is known to be problematic. Some improvement is
achieved in the EM approach by estimating provisional densities and
probabilities at selected points rather than person abilities, since proper
estimates always exist for the former but not necessarily for the latter in the
3-parameter model. Difficulties remain, however, from another source: the matrix
of second derivatives of the log likelihood function is often poorly conditioned
in the 3-parameter model. The inversion of this matrix, required in the
Newton-Raphson solution of Equations 4 through 6, can become unstable. This
practical problem at least partly motivates the extension discussed immediately
below.

Prior distributions on item parameters. In order to provide for stable and
"reasonable" item parameter estimates in the 3-parameter model, and in all models
for small samples of persons, provision has been made for the incorporation of
prior distributions on item parameters. For lower asymptotes, beta priors are
employed; for slopes, log-normal; for intercepts, normal. (Priors are rarely
necessary for intercepts; provision is made to facilitate linking studies, since
the prior distribution of a given parameter may be based on a previous estimate
and its standard error.) The program provides Bayes modal estimates rather than
maximum likelihood estimates when priors are used. Uncorrelated priors are
assumed; the first derivatives, Equations 4 through 6, are thereby modified by a
so-called "penalty" function, and an augmenting term is added to each diagonal
second derivative, Equations 9, 12, and 14. The terms added to the diagonal of
the matrix of second derivatives improve the conditioning of this matrix.
Solutions may be obtained from any data set with the imposition of sufficiently
strong priors on the item parameters, though judicious and thoughtful choice of
priors is recommended.
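The sketch below (Python, assuming NumPy; all function names are hypothetical) shows how independent priors enter a Newton-Raphson step: each log-prior contributes its gradient to the first derivatives and its curvature to the diagonal of the second-derivative matrix. Treating the slope prior as a normal density on log A_j, differentiated with respect to A_j and without the log-normal Jacobian term, is an assumption of this sketch rather than a documented BILOG detail.

```python
import numpy as np

def normal_prior_terms(value, mean, sd):
    """Gradient and curvature of a normal log-prior (e.g., on an intercept C_j)."""
    return -(value - mean) / sd**2, -1.0 / sd**2

def lognormal_prior_terms(slope, mean_log, sd_log):
    """Normal prior on log(slope), differentiated with respect to the slope A_j.

    The Jacobian term of the log-normal density is omitted (an assumption),
    so the prior mode on the slope scale is exp(mean_log).
    """
    u = (np.log(slope) - mean_log) / sd_log**2
    return -u / slope, -(1.0 / sd_log**2 - u) / slope**2

def beta_prior_terms(g, a, b):
    """Gradient and curvature of a beta(a, b) log-prior on a lower asymptote G_j."""
    grad = (a - 1.0) / g - (b - 1.0) / (1.0 - g)
    curv = -(a - 1.0) / g**2 - (b - 1.0) / (1.0 - g)**2
    return grad, curv

def penalize(fdrv, sdrv, prior_terms):
    """Add log-prior terms to the first derivatives and to the diagonal of the
    second-derivative matrix, giving the quantities for a Bayes modal step."""
    grads = np.array([t[0] for t in prior_terms])
    curvs = np.array([t[1] for t in prior_terms])
    return np.asarray(fdrv, float) + grads, np.asarray(sdrv, float) + np.diag(curvs)

terms = [normal_prior_terms(-1.2, 0.0, 2.0),      # intercept C_j
         lognormal_prior_terms(1.4, 0.0, 2.0),    # slope A_j
         beta_prior_terms(0.08, 1.25, 5.75)]      # lower asymptote G_j
fdrv_pen, sdrv_pen = penalize(np.zeros(3), -np.eye(3), terms)
print(fdrv_pen, np.diag(sdrv_pen))
```

Near their modes the curvature terms are negative, so adding them makes the diagonal more negative and the matrix better conditioned, which is the effect the text describes.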

Estimation of the latent distribution. The original Bock-Aitkin solution
assumes that persons are drawn from a specified distribution, normal or
otherwise. The program now allows for the simultaneous estimation of the latent
distribution if the user prefers. This is accomplished by revising the weights
A(X_k) at the beginning of each iteration as follows:

$$A^{(t+1)}(X_k) = \frac{1}{N}\sum_i P^{(t)}(X_k \mid \mathbf{x}_i) = \frac{1}{N}\sum_i \frac{P(\mathbf{x}_i \mid X_k)\,A^{(t)}(X_k)}{\sum_s P(\mathbf{x}_i \mid X_s)\,A^{(t)}(X_s)} \qquad [20]$$

The distribution is then restandardized to set the scale and location of the 
latent ability variable. Under this convention, a common slope parameter is 
estimated in the 1-parameter model while the standard deviation of the latent 
distribution is fixed at one; this is equivalent to the more typical practice of 
fixing all slopes at one but not restricting the ability parameters. 
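The weight update of Equation 20 followed by restandardization can be sketched as below (Python, assuming NumPy; `update_latent` is a hypothetical name). The text does not say how the restandardization is carried out; rescaling the points to mean zero and standard deviation one while keeping the new weights is an assumption of this sketch.

```python
import numpy as np

def update_latent(post, X):
    """Equation 20, then restandardization of the latent scale.

    post : persons-by-points posterior weights P(X_k | x_i) from the current cycle
    X    : current points
    Rescaling the points (rather than the weights) is an assumption of this sketch.
    """
    A_new = post.mean(axis=0)                     # (1/N) sum_i P(X_k | x_i)
    mean = np.sum(A_new * X)
    sd = np.sqrt(np.sum(A_new * (X - mean)**2))
    return (X - mean) / sd, A_new

rng = np.random.default_rng(1)
post = rng.random((50, 7))
post /= post.sum(axis=1, keepdims=True)          # stand-in posteriors, rows sum to one
X_new, A_new = update_latent(post, np.linspace(-3.0, 3.0, 7))
print(np.round(X_new, 3), np.round(A_new, 3))
```

Fixing the mean and standard deviation in this way is what lets a common slope be estimated in the 1-parameter model, as described above.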

Different patterns of item attempts for different persons. As seen in
Equations 17 and 18, there is no need to assume that all persons are
presented the same items. This feature is of particular value in the assessment
setting because item parameters may be estimated from data gathered in highly
efficient multiple-matrix sampling designs, where each person responds to only
one to five items in a scale. Although such sparse data preclude useful
estimation of each person's ability, they are no barrier to iteratively building
up the estimates of population densities and item proportions correct at the
points X_k. Persons with few responses are spread more broadly over the points
and persons with more responses are spread less broadly, each in accordance with
the information conveyed by his/her response pattern.

Examples 

In order to illustrate the use of the BILOG program, runs for 1-, 2-, and
3-parameter models are presented for two sets of data. First is a set of
responses from 1,000 persons to five items of the Law School Admissions Test
(LSAT), a data set which has been analyzed in the past by Bock and Lieberman
(1970), Bock and Aitkin (1981), Andersen (1973), Andersen and Madsen (1977), and
Thissen (1982). These data have been found to be well fit by a 1-parameter
logistic item response model and a normal distribution of ability. Second is a
set of simulated responses of 1,000 persons to 18 items. The known parameters of
the items, which include lower asymptotes, may be compared with the estimated
values.

Example 1: LSAT 

The five items of the LSAT analyzed by Bock and Lieberman in 1970 and
others since were, on the whole, rather easy for the persons in the sample; about
30% of the examinees answered all five items correctly. Andersen (1973) found
that the data are well fit by a 1-parameter logistic ogive model and an
underlying normal distribution of ability. These data were subjected to item
analysis via the 1-, 2-, and 3-parameter logistic models with BILOG, all under
the assumption of an underlying normal distribution.

Table 1 presents the resulting item parameter estimates and, for the 1- and
2-parameter solutions, a likelihood ratio test of fit against a general
multinomial alternative (see Bock & Aitkin, 1981). A straight maximum likelihood
solution could not be obtained for the 3-parameter model, so the solution shown
incorporates weak prior distributions on both slopes and asymptotes. The slopes
had log-normal prior distributions with means of zero on the log scale (i.e.,
slopes of one) and standard deviations of two (slope values two standard
deviations below and above the mean would be .018 and 54.598); asymptotes had a
beta prior with parameters 1.25 and 5.75 (roughly comparable to saying, with the
weight of five observations, that the asymptotes were .05). The formula for the
likelihood ratio test was applied to the 3-parameter solution, but it must be
noted that its distribution is not chi-square because the parameter estimates are
modes of posteriors, not maxima of the likelihood function; its value, gauged in
comparison with the degrees of freedom appropriate to a true maximum likelihood
solution for the 3-parameter model, may be considered a somewhat more
conservative index of fit.
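The numbers quoted for the LSAT priors can be checked directly. The short Python sketch below uses only the standard library; reading a beta(a, b) prior as a - 1 "correct" and b - 1 "incorrect" pseudo-observations, with mode (a - 1)/(a + b - 2), is the usual interpretation and is assumed here.

```python
import math

# Log-normal slope prior: normal on the log slope, mean 0 and standard deviation 2
mu, sigma = 0.0, 2.0
low, high = math.exp(mu - 2 * sigma), math.exp(mu + 2 * sigma)

# Beta(a, b) asymptote prior: mode (a-1)/(a+b-2), carrying a+b-2 pseudo-observations
a, b = 1.25, 5.75
mode = (a - 1) / (a + b - 2)
weight = a + b - 2

print(f"slopes: {low:.3f} to {high:.3f}; asymptote mode {mode:.2f} with weight {weight:g}")
# prints: slopes: 0.018 to 54.598; asymptote mode 0.05 with weight 5
```

The very wide slope range shows how weak this prior is; it mainly rules out degenerate solutions rather than pulling estimates toward one.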



Table 1

LSAT Item Parameter Estimates

Model   Chi-Square   df   Item   Threshold   Slope   Asymptote
1-P        9.90      19     1     -3.482      .788      .000
                            2     -1.270      .788      .000
                            3     -0.305      .788      .000
                            4     -1.659      .788      .000
                            5     -2.664      .788      .000
2-P        7.74      12     1     -3.318      .836      .000
                            2     -1.356      .731      .000
                            3     -0.279      .891      .000
                            4     -1.845      .697      .000
                            5     -3.074      .669      .000
3-P        9.27       7     1     -3.217      .831      .049
                            2     -1.176      .752      .048
                            3     -0.127     1.207      .029
                            4     -1.704      .694      .048
                            5     -3.114      .624      .050

It is no surprise to see that the 1-parameter model fits the data well and
that the 2-parameter model fits even better, but not sufficiently better to
justify the additional parameters estimated. As noted by Thissen (1982), the
1-parameter solution agrees (after rescaling) with Andersen's conditional
maximum likelihood solution (Andersen, 1973).

It is somewhat surprising to see that the 3-parameter solution appears
to fit worse than the 2-parameter solution, but this is because a maximum
likelihood solution was not attained; the resulting parameter estimates depend not
only on the data but on the priors. Bock and Lieberman (1970), estimating
intercepts and slopes for different fixed values of asymptotes, found that
asymptotes of zero did indeed fit best. It may be seen from the estimates of
asymptotes that the only item which shows much difference from the prior is
Item 3, the only item sufficiently difficult to provide much information about
an asymptote. For this item, the information pushes the asymptote value down in
the direction of zero.

Example 2: Simulated Data 

Responses were generated from a random sample of 1,000 simulated examinees
from a standard normal distribution to an 18-item test, in accordance with a
3-parameter logistic ogive item response model. The generating item parameters
are shown in Table 2. There are essentially two groups of nine items each. In
the first group, all slopes are 2.0 and all lower asymptotes are .05; thresholds
range from -2.0 to +2.0 in increments of .5. In the second group, all slopes
are 2.0 and all asymptotes are .25; thresholds again range from -2.0 to +2.0 in
increments of .5. The broad range of difficulty of the items is reflected in
their resulting proportion-correct values, which ranged from .11 to .96.
Item-test biserials ranged from .4 to .8.
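A data set of this kind can be generated as follows. This Python sketch (assuming NumPy; `simulate_3pl` is a hypothetical name) draws abilities from a standard normal distribution and responses from Equation 1 with the Table 2 parameters. The random-number generator and seed are arbitrary choices, so the proportions correct will differ somewhat from the values reported above.

```python
import numpy as np

def simulate_3pl(thresholds, slopes, asymptotes, n_persons=1000, seed=0):
    """Persons-by-items 0/1 responses under the 3-parameter logistic model."""
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(n_persons)                      # abilities ~ N(0, 1)
    z = slopes * (theta[:, None] - thresholds)                  # A_j (theta_i - B_j)
    p = asymptotes + (1.0 - asymptotes) / (1.0 + np.exp(-z))    # Equation 1
    return (rng.random(p.shape) < p).astype(int)

thresholds = np.tile(np.arange(-2.0, 2.01, 0.5), 2)   # -2.0 to 2.0 by .5, for each group
slopes = np.full(18, 2.0)
asymptotes = np.repeat([0.05, 0.25], 9)
x = simulate_3pl(thresholds, slopes, asymptotes)
print(x.mean(axis=0).round(2))   # proportion correct for each item
```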



Table 2

Generating Values of Item Parameters for Simulated Data Example

Item   Threshold   Slope   Asymptote
  1      -2.00      2.00      .05
  2      -1.50      2.00      .05
  3      -1.00      2.00      .05
  4      -0.50      2.00      .05
  5       0.00      2.00      .05
  6       0.50      2.00      .05
  7       1.00      2.00      .05
  8       1.50      2.00      .05
  9       2.00      2.00      .05
 10      -2.00      2.00      .25
 11      -1.50      2.00      .25
 12      -1.00      2.00      .25
 13      -0.50      2.00      .25
 14       0.00      2.00      .25
 15       0.50      2.00      .25
 16       1.00      2.00      .25
 17       1.50      2.00      .25
 18       2.00      2.00      .25


Table 3

Item Parameter Estimates for Simulated Data for the 1-, 2-, and 3-Parameter Models

Item  Intercept   SE    Slope    SE   Threshold   SE   Dispersion   SE   Asymptote   SE   Chi-Square   df    Prob

1-Parameter Model
  1    -3.632   .141   1.197   .015    -3.035   .141     .836    .011      .0      .0       7.7       9   .5640
  2    -3.324   .117   1.197   .015    -2.777   .117     .836    .011      .0      .0        ?        9     ?
  3    -2.083   .089   1.197   .015    -1.741   .089     .836    .011      .0      .0      29.3       9   .0006
  4    -1.415   .081   1.197   .015    -1.162   .082     .836    .011      .0      .0      46.1       9   .0000
  5    -0.384   .074   1.197   .015    -0.320   .075     .836    .011      .0      .0      20.3       9   .0161
  6     0.391   .078   1.197   .015     0.327   .079     .836    .011      .0      .0      39.8       9   .0000
  7     1.272   .084   1.197   .015     1.063   .085     .836    .011      .0      .0      25.7       9   .0024
  8     1.885   .095   1.197   .015     1.575   .096     .836    .011      .0      .0       8.5       9   .4840
  9     2.196   .105   1.197   .015     1.835   .105     .836    .011      .0      .0      29.9       9   .0005
 10    -4.603   .170   1.197   .015    -3.847   .170     .836    .011      .0      .0      30.0       9   .0005
 11    -2.867   .109   1.197   .015    -2.396   .110     .836    .011      .0      .0       4.2       9     ?
 12    -2.619   .100   1.197   .015    -2.188   .101     .836    .011      .0      .0      23.9       9   .0046
 13    -1.616   .081   1.197   .015    -1.350   .082     .836    .011      .0      .0      19.7       9   .0196
 14    -0.818   .072   1.197   .015    -0.684   .073     .836    .011      .0      .0      20.7       9   .0142
 15    -0.301   .070   1.197   .015    -0.251   .070     .836    .011      .0      .0      26.5       9   .0018
 16     0.275   .071   1.197   .015     0.230   .072     .836    .011      .0      .0      28.6       9   .0008
 17     0.669   .071   1.197   .015     0.559   .072     .836    .011      .0      .0      57.1       9   .0000
 18     0.837   .073   1.197   .015     0.700   .073     .836    .011      .0      .0      56.6       9   .0000
All items                                                                                 501.5     162   .0000

2-Parameter Model
  1    -3.587   .142   1.513   .116    -2.368   .149     .660    .050      .0      .0       3.7       8   .8821
  2    -3.341   .121   2.008   .105    -1.664   .121     .498    .026      .0      .0        ?        8     ?
  3    -1.982   .092   1.922   .105    -1.032   .095     .520    .029      .0      .0      13.5       8   .0958
  4    -1.332   .087   2.168   .122    -0.614   .089     .461    .026      .0      .0      19.5       8     ?
 ...

BILOG solutions for the 1-, 2-, and 3-parameter models are shown in Table 3.
The 1- and 2-parameter solutions are straight maximum likelihood solutions,
with a normal distribution of persons assumed. The 3-parameter solution required
priors on all item parameters, the specification of which is described in
greater detail below. The indices of goodness of fit that accompany the
estimates are not true likelihood chi-squares, but approximations based on
combining persons into 10 homogeneous groups on the basis of their Bayes ability
estimates. Counts of correct responses observed in each group were then compared
with those expected under the assumption that all persons in a group have the
same true ability.
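One way to compute such a grouped fit index is the likelihood-ratio form sketched below (Python, assuming NumPy; `group_fit_g2` is a hypothetical name). The text does not give BILOG's exact formula, so the G-squared form is an assumption. The degrees-of-freedom rule, number of groups minus the number of free item parameters, is inferred from the df column of Table 3 (9 for the 1-parameter and 8 for the 2-parameter model with 10 groups).

```python
import numpy as np

def group_fit_g2(r, n, p_expected, n_item_params):
    """Likelihood-ratio chi-square comparing observed and expected counts correct.

    r, n       : numbers correct and numbers of persons in each ability group
    p_expected : model probability of a correct response at each group's ability
    """
    r, n, p = (np.asarray(v, float) for v in (r, n, p_expected))

    def term(obs, exp):
        # obs * log(obs / exp), using the convention 0 * log 0 = 0
        with np.errstate(divide="ignore", invalid="ignore"):
            t = obs * np.log(obs / exp)
        return np.where(obs > 0, t, 0.0)

    g2 = 2.0 * np.sum(term(r, n * p) + term(n - r, n * (1.0 - p)))
    df = len(r) - n_item_params
    return g2, df

n = [100, 100, 100]
p = [0.2, 0.5, 0.8]
print(group_fit_g2([20, 50, 80], n, p, 1))   # observed equal to expected
print(group_fit_g2([30, 50, 70], n, p, 1))   # observed departs from expected
```

Because persons are grouped on estimated rather than true abilities, the reference chi-square distribution is only approximate, which is why the text calls these indices approximations.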

The 1-parameter solution exhibits biases in both thresholds and slopes, as
compared with the generating values. Although all items have the same
generating slope of 2.0, the common value estimated is only 1.2, due to the
attenuation caused by the nonzero lower asymptotes. There is a tendency for
difficult items to fit more poorly than easy items, and for items of the second
group (with higher asymptotes) to fit more poorly than items of the first group
(with lower asymptotes).

The 2-parameter solution represents a marked improvement in fit. Many 
items, particularly easier items, are well explained by this solution. Serious 
biases are apparent, however, in the slope estimates. Again, because of the 
nonzero lower asymptotes, slopes are consistently underestimated to a degree 
that increases with difficulty and with the values of the asymptote itself. 

The 3-parameter solution represents another, though less impressive, im- 
provement in fit; the solution required prior distributions on intercepts, 
slopes, and asymptotes. Normal priors with mean zero and standard deviation two 
were placed on all intercepts. It may be seen that the data effectively domi- 
nated the prior in this case, as considerable information about intercepts is 
present in the data. Log-normal priors with mean .588 and standard deviation 
.500 were placed on slopes; this corresponds roughly to a prior mean of 1.8 and 
a standard deviation of 1.0 for the slopes themselves, suggesting a prior belief 
that slopes would probably range between about .5 and 5.0. Beta priors with 
parameters (3.5, 47.5) were placed on asymptotes for the first 11 items and with 
parameters (11, 41) for the last 7 items; this corresponds to saying with the 
weight of 50 observations that the asymptotes were .05 for the first 11 items 
and .20 for the last 7. These values were obtained by inspecting plots of the 
residuals from the 2-parameter solution, as illustrated by Figure 1. 

Although the 3-parameter solution provides an adequate fit to the data,
with a ratio of chi-square to degrees of freedom of less than one and a half,
discrepancies remain between final parameter estimates and generating values.
For the second group of items in particular, both thresholds and slopes tend to
be too low. The apparent paradox of adequate fit but imperfect recovery of item
parameters is resolved at least partially by an examination of estimated and
observed response curves. Figure 1 plots data for Item 16 under the 2-parameter
solution; Figure 2 plots the data for the same item under the 3-parameter
solution. Despite nontrivial differences in estimates of item parameters (.5 vs.
.7 for threshold, .9 vs. 1.8 for slope, 0 vs. .16 for asymptote), both curves are
able to explain observed proportions of correct response in the region where the
majority of persons are to be found. Despite the differences in their
parameters, the 2- and 3-parameter curves are not very different with respect to
the data at hand.
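The two Item 16 curves can be tabulated side by side to see where they agree and where they separate. The Python sketch below (assuming NumPy) uses the rounded parameter values quoted in the text, not the full-precision estimates, so the printed differences are only indicative.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

theta = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
p2 = logistic(0.9 * (theta - 0.5))                   # 2-parameter: B = .5, A = .9
p3 = 0.16 + 0.84 * logistic(1.8 * (theta - 0.7))     # 3-parameter: B = .7, A = 1.8, G = .16
for t, a, b in zip(theta, p2, p3):
    print(f"theta={t:+.1f}  2-P={a:.3f}  3-P={b:.3f}  diff={b - a:+.3f}")
```

With these rounded values the curves nearly coincide around θ = -1 and separate most at the extremes of the range shown, where few of the simulated examinees fall.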
Figure 1

Observed and Expected 2-Parameter Logistic Response Curve for Item 16
(Smooth Line is Fitted Response Curve; "X" Represents Proportion Correct of a
Group of Persons with Approximately Similar Abilities; Vertical Bars around
Curve Represent Two Standard Errors around Expected Group Proportions Correct)

Figure 2

Observed and Expected 3-Parameter Logistic Response Curve for Item 16
(Smooth Line is Fitted Response Curve; "X" Represents Proportion Correct of a
Group of Persons with Approximately Similar Abilities; Vertical Bars around
Curve Represent Two Standard Errors around Expected Group Proportions Correct)
Discussion

With the use of marginal maximum likelihood estimation procedures and prior
distributions on item parameters, it is now possible to estimate item response
curves under the 2- and 3-parameter logistic models from even very sparse
data sets. It will be noted that the emphasis here is on the estimation of
response curves rather than on item parameters. Simulation studies suggest that
the recovery of generating item parameters is problematic, even with large
numbers of items and persons, when the parameters of an item are not well
identified by the calibration sample. These circumstances seem to obtain quite
frequently with the 3-parameter model and, occasionally, with the 2-parameter
model when the calibration sample does not span a sufficiently broad range of
ability. On the other hand, the estimated item response curves do explain the
data satisfactorily.

The explanation of these findings is that, for typical educational tests,
data are well explained by a region of values in the parameter space. For an
easy item, for example, the data at hand may be well explained by either a 2- or
a 3-parameter ogive; curves of each type can be found that are virtually identical
in the region of the ability scale where the calibration examinees are to be
found. The use of weak prior distributions will function in this situation to
keep the resulting parameter estimates "reasonable," or in line with the values
that the substantive interpretations of the item parameters would suggest (e.g.,
item slopes ranging between, say, 0 and 4, and asymptotes ranging between, say,
0 and .25).

The practical implication of this result is that the substantive
interpretation of item parameters in the 3-parameter model (and, to a lesser
extent, the 2-parameter model as well) may not always be justified. Maximum
likelihood estimates for a given item may differ substantially from another set
of values that reproduce the calibration data nearly as well. Discussion of item
characteristics could be couched in terms of the item information function
instead, since all sets of item parameter estimates in the "solution space" will
yield similar information functions in the region where the data lie.
Characteristics such as the point of maximum information and the value of the
information function at that point can be expected to be much more stable than
the item parameter estimates themselves.
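For reference, the item information function of the logistic model in Equation 1 (with no 1.7 scaling constant) is I(θ) = A² (P - G)² (1 - P) / [(1 - G)² P]. The Python sketch below (assuming NumPy; function names are hypothetical) evaluates it and locates its maximum on a fine grid, giving the two summary characteristics mentioned above.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def item_information(theta, slope, threshold, asymptote=0.0):
    """Fisher information of a logistic item in the parameterization of Equation 1."""
    p = asymptote + (1.0 - asymptote) * logistic(slope * (theta - threshold))
    return slope**2 * (p - asymptote)**2 * (1.0 - p) / ((1.0 - asymptote)**2 * p)

def max_information(slope, threshold, asymptote=0.0):
    """Location and height of maximum information, found on a grid of step .001."""
    grid = np.linspace(threshold - 4.0, threshold + 4.0, 8001)
    info = item_information(grid, slope, threshold, asymptote)
    k = int(np.argmax(info))
    return grid[k], info[k]

print(max_information(1.5, 0.3))         # 2-parameter item
print(max_information(1.5, 0.0, 0.2))    # nonzero asymptote shifts the peak upward
```

For a 2-parameter item the peak lies at the threshold with height A²/4; a nonzero asymptote moves the peak above the threshold and lowers it.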

Fortunately, most applications of IRT depend on the shape and location of
response curves rather than the parameter values, particularly when applications
are foreseen for examinees who are typical of the calibration sample. The
estimation of an individual's ability from a given response pattern would
typically be similar if computed from any item parameter values that produce
similar response curves in the neighborhood of his/her ability. Discrepancies
would be more likely for persons with extreme abilities.

One application that demands special attention, however, is vertical
equating, or the linking of tests across broad ranges of ability, often across
several grades or age groups. One approach to the equating problem is to
calibrate tests separately in the low and high ability groups, say, and then to
attempt to find the linear transformation that produces the closest match of item
parameter estimates for those items that were administered to both groups. Now,
a linking item will tend to be comparatively easy for the high ability group and
comparatively difficult for the low ability group. This means that the range of
ability for which its response curve is well estimated in either group does not
cover the region where the groups overlap, i.e., where the two estimated curves
are supposed to be made to match. Poor linking may result as an artifact of the
multicollinearity of item parameter estimates. The information needed for a
proper link is found not just in the item parameter estimates and their standard
errors, but in the matrix of correlations among the estimates as well. (This
problem may be avoided by calibrating all items together with responses from all
groups simultaneously, an option available in both BILOG and LOGIST.)
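As an illustration of the separate-calibration approach, the Python sketch below (assuming NumPy; function names are hypothetical) uses the mean/sigma method, one common way of choosing the linear transformation, swapped in here for illustration and not a procedure described in this paper. It matches the mean and standard deviation of the linking items' thresholds and uses nothing else, so it ignores the standard errors and the correlations among estimates that the paragraph above identifies as necessary for a proper link.

```python
import numpy as np

def mean_sigma_link(b_from, b_to):
    """Transformation theta_to = k * theta_from + m from common-item thresholds."""
    b_from, b_to = np.asarray(b_from, float), np.asarray(b_to, float)
    k = b_to.std() / b_from.std()
    m = b_to.mean() - k * b_from.mean()
    return k, m

def transform_item(slope, threshold, k, m):
    """Re-express item parameters on the target scale: B* = k B + m, A* = A / k."""
    return slope / k, k * threshold + m

# Thresholds of four linking items as estimated in each separate calibration
k, m = mean_sigma_link([-1.0, 0.0, 1.0, 2.0], [-1.0, 1.0, 3.0, 5.0])
print(k, m)
print(transform_item(1.0, 0.5, k, m))
```

Dividing the slope by k keeps A(θ - B) unchanged under the transformation, so each response curve is the same curve expressed on the new scale.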